-
Notifications
You must be signed in to change notification settings - Fork 25
[PoC] Push simple filter conditions into $lookup stage. #345
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[PoC] Push simple filter conditions into $lookup stage. #345
Conversation
129893f
to
3e685a8
Compare
Does it address #309? Do we need some additional tests for other cases besides the changed test? I'll have to take it on faith that this does the right thing, because it would take me a lot of time to understand this code. Some comments explaining variable names like |
It fixes #309, but there’s room for improvement. It tries to push some conditions into the Could it be better? Yes. We could create a Should it have more tests? Maybe, but we didn’t add extra tests for previous join strategies either. It passes the existing Django join tests. EDIT: On the other hand, this is a PoC. If it seems to work in the performance test, it could be implemented. |
3e685a8
to
764cd51
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm fine with the current PoC addition without tests. However, if we can get a proper design of the optimization outside of this patch-fix, I'd say then we need to write out a proper test suite.
if ( | ||
isinstance(expr, Lookup) | ||
and isinstance(expr.lhs, Col) | ||
and (is_direct_value(expr.rhs) or isinstance(expr.rhs, Value)) | ||
): | ||
promote_filters[expr.lhs.alias].append(expr) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Out of scope for this PR, but we can definitely have two options:
- If it's a direct_value
Directly promote the filter
- If it's a sub-expression
convert the sub-expression and then place it in the
promote_filters
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, that's the idea.
Now the conditions are pushed up (not sure if that's the right term, but I mean passing the filter into the $lookup
) only if the Django lookup looks like one of the following:
Column operator value
Column operator Column
← this one might actually be broken in the current code 😬
What the code does now is:
It filters out all conditions that involve having
, subqueries, or composed conditions like A and (B or C)
.
Why?
For subqueries, we need to promote or isolate certain $lookup
s in the pipeline.
For composed conditions, we need to analyze the expression tree to determine which parts can be pushed up. For example, in A and (B or C)
, if A
, B
, and C
are atomic and B
and C
can filter the outer collection, then we can push up the filter B or C
.
@@ -165,18 +152,47 @@ def join(self, compiler, connection): | |||
# based on their rerouted positions in the join pipeline. | |||
replacements = {} | |||
for col, parent_pos in columns: | |||
column_target = Col(compiler.collection_name, expr.output_field.__class__()) | |||
target = col.target.clone() | |||
target.remote_field = col.target.remote_field |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this not cloned from clone()
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would like to answer yes, but the ManyToMany field isn't well cloned, because of some initialization, so I have to copy the remote_field.
lookup_pipeline = [] | ||
lhs_fields = [] | ||
rhs_fields = [] | ||
# Add a join condition for each pair of joining fields. | ||
parent_template = "parent__field__" | ||
for lhs, rhs in self.join_fields: | ||
lhs, rhs = connection.ops.prepare_join_on_clause( | ||
self.parent_alias, lhs, compiler.collection_name, rhs | ||
) | ||
lhs_fields.append(lhs.as_mql(compiler, connection)) | ||
# In the lookup stage, the reference to this column doesn't include | ||
# the collection name. | ||
rhs_fields.append(rhs.as_mql(compiler, connection)) | ||
# Handle any join conditions besides matching field pairs. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this the portion of code that generates the code shown below?
{
$lookup: {
...
$pipeline: [
...
{ $and: [
...,
{$eq: ['$$parent_field_0', '$_id']} # This line?
]
}
]
}
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, it is the same code as before, I only created a function because the process applied to extra
is applied to the new filters
if extra: | ||
replacements = _get_reroot_replacements(extra.leaves()) | ||
extra_condition = [extra.replace_expressions(replacements).as_mql(compiler, connection)] | ||
else: | ||
extra_condition = [] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm assuming these are the expressions that we aren't able to easily convert yet.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No, that was old code, some lookups have extra conditions and others haven't , I did nothing new with extra
. So if no extra, no extra conditions 😄
In this PR we tackle the slow join pushing the simple conditions inside the $lookup. the condition is also fix a bug.
New query:
Previous query